AGTCCCAGACGGGCTTTTCCCAGAGAGCTAAAAGAGAAGGGCCAGAGA ATG TCGTCCCAG 
CCAGCAGGGAACCAGACCTCCCCCGGGGCCACAGAGGACTACTCCTATGGCAGCTGGTAC 
ATCGATGAGCCCCAGGGGGGCGAGGAGCTCCAGCCAGAGGGGGAAGTGCCCTCCTGCCAC 
ACCAGCATACCACCCGGCCTGTACCACGCCTGCCTGGCCTCGCTGTCAATCCTTGTGCTG 
CTGCTCCTGGCCATGCTGGTGAGGCGCCGCCAGCTCTGGCCTGACTGTGTGCGTGGCAGG 
CGCGGCCTGCCCAGCCCTGTGGATTTCTTGGCTGGGGACAGGCCCCGGGCAGTGCCTGCT 
GCTGTTTTCATGGTCCTCCTGAGCTCCCTGTGTTTGCTGCTCCCCGACGAGGACGCATTG 
CCCTTCCTGACTCTCGCCTCAGCACCCAGCCAAGATGGGAAAACTGAGGCTCCAAGAGGG 
GCCTGGAAGATACTGGGACTGTTCTATTATGCTGCCCTCTACTACCCTCTGGCTGCCTGT 
GCCACGGCTGGCCACACAGCTGCACACCTGCTCGGCAGCACGCTGTCCTGGGCCCACCTT 
GGGGTCCAGGTCTGGCAGAGGGCAGAGTGTCCCCAGGTGCCCAAGATCTACAAGTACTAC 
TCCCTGCTGGCCTCCCTGCCTCTCCTGCTGGGCCTCGGATTCCTGAGCCTTTGGTACCCT 
GTGCAGCTGGTGAGAAGCTTCAGCCGTAGGACAGGAGCAGGCTCCAAGGGGCTGCAGAGC 
AGCTACTCTGAGGAATATCTGAGGAACCTCCTTTGCAGGAAGAAGCTGGGAAGCAGCTAC 
CACACCTCCAAGCATGGCTTCCTGTCCTGGGCCCGCGTCTGCTTGAGACACTGCATCTAC 



ACGGCCATTTACCAGGTGGCCCTGCTGCTGCTGGTGGGCGTGGTACCCACTATCCAGAAG 
GTGAGGGCAGGGGTCACCACGGATGTCTCCTACCTGCTGGCCGGCTTTGGAATCGTGCTC 
TCCGAGGACAAGCAGGAGGTGGTGGAGCTGGTGAAGCACCATCTGTGGGCTCTGGAAGTG 
TGCTACATCTCAGCCTTGGTCTTGTCCTGCTTACTCACCTTCCTGGTCCTGATGCGCTCA 
CTGGTGACACACAGGACCAACCTTCGAGCTCTGCACCGAGGAGCTGCCCTGGACTTGAGT 
CCCTTGCATCGGAGTCCCCATCCCTCCCGCCAAGCCATATTCTGTTGGATGAGCTTCAGT 
GCCTACCAGACAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATCATCTTCTTCCTG 
GGAACCACGGCCCTGGCCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAACCTCCTG 
GTCTTCCGTTCCCTGGAGTCCTCGTGGCCCTTCTGGCTGACTTTGGCCCTGGCTGTGATC 
CTGCAGAACATGGCAGCCCATTGGGTCTTCCTGGAGACTGATGATGGACACCCACAGCTG 
ACCAACCGGCGAGTGCTCTATGCAGCCACCTTTCTTCTCTTCCCCCTCAATGTGCTGGTG 
GGTGCCATGGTGGCCACCTGGCGAGTGCTCCTCTCTGCCCTCTACAACGCCATCCACCTT 
GGCCAGATGGACCTCAGCCTGCTGCCACCGAGAGCCGCCACTCTCGACCCCGGCTACTAC 
ACGTACCGAAACTTCTTGAAGATTGAAGTCAGCCAGTCGCATCCAGCCATGACAGCCTTC 
TGCTCCCTGCTCCTGCAAGCGCAGAGCCTCCTACCCAGGAGCATGGCAGCCCCCCAGGAC 
AGCCTGAGACCAGGGGAGGAAGACGAAGGGATGCAGCTGCTACAGACAAAGGACTCCATG 
GCCAAGGGAGCTAGGCCCGGGGCCAGCCGCGGCAGGGCTCGCTGGGGTCTGGCCTACACG 
CTGCTGCACAACCCAACCCTGCAGGTCTTCCGCAAGACGGCCCTGTTGGGTGCCAATGGT 
GCCCAGCCCTGAGGGCAGGGAAGGTCAACCCACCTGCCCATCTGTGCTGAGGCATGTTCC 
TGCCTACCATCCTCCTCCCTCCCCGGCTGTCCTCCCAGCATCACACCAGCCATGCAGCCA 
GCAGGTCCTCCGGATCACTGTGGTTGGGTGGAGGTCTGTCTGCACTGGGAGCCTCAGGAG 
GCCTCTGCTCCACCCACTTGGCTATGGGAGAGCCAGCAGGGGTTCTGGAGAAAAAAACTG 
GTGGGTTAGGGCCTTGGTCCAGGAGCCAGTTGAGGCAGGGGAGGCACATCCAGGGGTGTC 
CCTACCCTGGCTCTGCCATCAGCCTTGAAGGGCCTCGATGAAGGCTTGTCTGGAACCACT 
CCAGCCCAGCTCCACCTCAGCCTTGGCGTTCACGCTGTGGAAGCAGCCAAGGCACTTCCT 
GACCCGCTCAGGGCCACGGACCTCTCTGGGGAGTGGCCGGAAAGCTCCCGGTCCTCTGGC 
r-T -CAGGGCAGGCGAAGTCATGACTCAGACCAGGTCCCACACTGAGCTGCCCACACTCGA 
mssqpagnqtspgatedysygswyidepqggeelqpegevpschtsippglyhaclasls 
ilvllllamlvrrrqlwpdcvrgrpglpspvdflagdrpravpaavfmvllsslclllpd 
edalpfltlasapsqdgkteaprgawkilglfyyaalyyplaacataghtaahllgstls 
wahlgvqvwqraecpqvpkiykyysllaslplllglgflslwypvqlvrsfsrrtgagsk 
glqssyseeylrnllcrkklgssyhtskhgflswarvclrhciytpqpgfhlplklvlsa 
tltgtaiyqvallllvgvvptiqkvragvttdvsyllagfgivlsedkqevvelvkhhlw 
alevcyisalvlsclltflvlmrslvthrtnlralhrgaaldlsplhrsphpsrqaifcw 
msfsayqtaficlgllvqqiifflgttalaflvlmpvlhgrnlllfrslesswpfwltla 
lavilqnmaahwvflethdghpqltnrrvlyaatfllfplnvlvgamvatwrvllsalyn 
aihlgqmdlsllppraatldpgyytyrnflkievsqshpamtafcslllqaqsllprtma 
apqdslrpgeedegmqllqtkdsmakgarpgasrgrarwglaytllhnptlqvfrktall 
gangaqp 

Important features of the protein: 
Signal peptide: 



Transmembrane domain: 

54-69 

102-119 

148-166 

207-222 

301-320 

364-380 

431-451 

474-489 

560-535 

Motif file: 

Motif name: N-glycosylation site. 
8-12 

Motif name: N-myristoylation site. 

50-56 

176-182 

241-247 

317-323 

341-347 

525-531 

627-633 

631-637 

640-646 

661-667 



Motif name: ATP/GTP-binding site motif A (P-loop). 



None 
PRO XXXXXXXXXXXXXXX (Length = 15 amino acids) 

Comparison Protein XXXXXYYYYYYY (Length = 12 amino acids) 

% amino acid sequence identity = 

(the number of identically matching amino acid residues between the two polypeptide 
sequences as determined by ALIGN-2) divided by (the total number of amino acid residues 
of the PRO polypeptide) = 
5 divided by 15 = 33.3% 



FIG. 3A 



PRO XXXXXXXXXX (Length = 10 amino acids) 

Comparison Protein XXXXXYYYYYYZZYZ (Length = 15 amino acids) 

% amino acid sequence identity = 

(the number of identically matching amino acid residues between the two polypeptide 
sequences as determined by ALIGN-2) divided by (the total number of amino acid residues 
of the PRO polypeptide) = 
5 divided by 10 = 50% 



FIG. 3B 



PRO-DNA NNNNNNNNNNNNNN (Length = 14 nucleotides) 

Comparison DNA NNNNNNLLLLLLLLLL (Length = 16 nucleotides) 

% nucleic acid sequence identity = 

(the number of identically matching nucleotides between the two nucleic acid sequences 
as determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA 
nucleic acid sequence) = 
6 divided by 14 = 42.9% 



FIG. 3C 



PRO-DNA NNNNNNNNNNNN (Length = 12 nucleotides) 

Comparison DNA NNNNLLLVV (Length = 9 nucleotides) 

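% nucleic acid sequence identity = 

(the number of identically matching nucleotides between the two nucleic acid sequences 
as determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA 
nucleic acid sequence) = 
4 divided by 12 = 33.3% 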



FIG. 3D 
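
The calculation illustrated in FIGS. 3A-3D reduces to a single division. As a minimal sketch in C (the function name `percent_identity` is hypothetical and not part of ALIGN-2; the match count is assumed to come from an alignment program):

```c
#include <assert.h>

/* Hypothetical helper, not part of ALIGN-2: percent sequence identity as
 * defined in FIGS. 3A-3D, i.e. the number of identically matching residues
 * (or nucleotides) divided by the total length of the PRO (or PRO-DNA)
 * sequence. The length of the comparison sequence does not enter. */
double percent_identity(int matches, int pro_len)
{
    assert(pro_len > 0);                        /* avoid division by zero */
    assert(matches >= 0 && matches <= pro_len); /* cannot match more than PRO has */
    return 100.0 * (double)matches / (double)pro_len;
}
```

Calling `percent_identity(5, 10)` reproduces the FIG. 3B case; the other three figures differ only in the two arguments.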
/*
 *
 * C-C increased from 12 to 15
 * Z is average of EQ
 * B is average of ND
 * match with stop is _M; stop-stop = 0; J (joker) match = 0
 */
#define _M      -8      /* value of a match with a stop */

int     _day[26][26] = {
/*        A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z */
/* A */ { 2, 0,-2, 0, 0,-4, 1,-1,-1, 0,-1,-2,-1, 0,_M, 1, 0,-2, 1, 1, 0, 0,-6, 0,-3, 0},
/* B */ { 0, 3,-4, 3, 2,-5, 0, 1,-2, 0, 0,-3,-2, 2,_M,-1, 1, 0, 0, 0, 0,-2,-5, 0,-3, 1},
/* C */ {-2,-4,15,-5,-5,-4,-3,-3,-2, 0,-5,-6,-5,-4,_M,-3,-5,-4, 0,-2, 0,-2,-8, 0, 0,-5},
/* D */ { 0, 3,-5, 4, 3,-6, 1, 1,-2, 0, 0,-4,-3, 2,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 2},
/* E */ { 0, 2,-5, 3, 4,-5, 0, 1,-2, 0, 0,-3,-2, 1,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 3},
/* F */ {-4,-5,-4,-6,-5, 9,-5,-2, 1, 0,-5, 2, 0,-4,_M,-5,-5,-4,-3,-3, 0,-1, 0, 0, 7,-5},
/* G */ { 1, 0,-3, 1, 0,-5, 5,-2,-3, 0,-2,-4,-3, 0,_M,-1,-1,-3, 1, 0, 0,-1,-7, 0,-5, 0},
/* H */ {-1, 1,-3, 1, 1,-2,-2, 6,-2, 0, 0,-2,-2, 2,_M, 0, 3, 2,-1,-1, 0,-2,-3, 0, 0, 2},
/* I */ {-1,-2,-2,-2,-2, 1,-3,-2, 5, 0,-2, 2, 2,-2,_M,-2,-2,-2,-1, 0, 0, 4,-5, 0,-1,-2},
/* J */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* K */ {-1, 0,-5, 0, 0,-5,-2, 0,-2, 0, 5,-3, 0, 1,_M,-1, 1, 3, 0, 0, 0,-2,-3, 0,-4, 0},
/* L */ {-2,-3,-6,-4,-3, 2,-4,-2, 2, 0,-3, 6, 4,-3,_M,-3,-2,-3,-3,-1, 0, 2,-2, 0,-1,-2},
/* M */ {-1,-2,-5,-3,-2, 0,-3,-2, 2, 0, 0, 4, 6,-2,_M,-2,-1, 0,-2,-1, 0, 2,-4, 0,-2,-1},
/* N */ { 0, 2,-4, 2, 1,-4, 0, 2,-2, 0, 1,-3,-2, 2,_M,-1, 1, 0, 1, 0, 0,-2,-4, 0,-2, 1},
/* O */ {_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M, 0,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M},
/* P */ { 1,-1,-3,-1,-1,-5,-1, 0,-2, 0,-1,-3,-2,-1,_M, 6, 0, 0, 1, 0, 0,-1,-6, 0,-5, 0},
/* Q */ { 0, 1,-5, 2, 2,-5,-1, 3,-2, 0, 1,-2,-1, 1,_M, 0, 4, 1,-1,-1, 0,-2,-5, 0,-4, 3},
/* R */ {-2, 0,-4,-1,-1,-4,-3, 2,-2, 0, 3,-3, 0, 0,_M, 0, 1, 6, 0,-1, 0,-2, 2, 0,-4, 0},
/* S */ { 1, 0, 0, 0, 0,-3, 1,-1,-1, 0, 0,-3,-2, 1,_M, 1,-1, 0, 2, 1, 0,-1,-2, 0,-3, 0},
/* T */ { 1, 0,-2, 0, 0,-3, 0,-1, 0, 0, 0,-1,-1, 0,_M, 0,-1,-1, 1, 3, 0, 0,-5, 0,-3, 0},
/* U */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* V */ { 0,-2,-2,-2,-2,-1,-1,-2, 4, 0,-2, 2, 2,-2,_M,-1,-2,-2,-1, 0, 0, 4,-6, 0,-2,-2},
/* W */ {-6,-5,-8,-7,-7, 0,-7,-3,-5, 0,-3,-2,-4,-4,_M,-6,-5, 2,-2,-5, 0,-6,17, 0, 0,-6},
/* X */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* Y */ {-3,-3, 0,-4,-4, 7,-5, 0,-1, 0,-4,-1,-2,-2,_M,-5,-4,-4,-3,-3, 0,-2, 0, 0,10,-4},
/* Z */ { 0, 1,-5, 2, 3,-5, 0, 2,-2, 0, 0,-2,-1, 1,_M, 0, 3, 0, 0, 0, 0,-2,-6, 0,-4, 4}
};
/*
 */
#include <stdio.h>
#include <ctype.h>

#define MAXJMP  16      /* max jumps in a diag */
#define MAXGAP  24      /* don't continue to penalize gaps larger than this */
#define JMPS    1024    /* max jmps in an path */
#define MX      4       /* save if there's at least MX-1 bases since last jmp */

#define DMAT    3       /* value of matching bases */
#define DMIS    0       /* penalty for mismatched bases */
#define DINS0   8       /* penalty for a gap */
#define DINS1   1       /* penalty per base */
#define PINS0   8       /* penalty for a gap */
#define PINS1   4       /* penalty per residue */

struct jmp {
    short           n[MAXJMP];  /* size of jmp (neg for dely) */
    unsigned short  x[MAXJMP];  /* base no. of jmp in seq x */
};                              /* limits seq to 2^16 -1 */

struct diag {
    int             score;      /* score at last jmp */
    long            offset;     /* offset of prev block */
    short           ijmp;       /* current jmp index */
    struct jmp      jp;         /* list of jmps */
};

struct path {
    int     spc;                /* number of leading spaces */
    short   n[JMPS];            /* size of jmp (gap) */
    int     x[JMPS];            /* loc of jmp (last elem before gap) */
};

char            *ofile;         /* output file name */
char            *namex[2];      /* seq names: getseqs() */
char            *prog;          /* prog name for err msgs */
char            *seqx[2];       /* seqs: getseqs() */
int             dmax;           /* best diag: nw() */
int             dmax0;          /* final diag */
int             dna;            /* set if dna: main() */
int             endgaps;        /* set if penalizing end gaps */
int             gapx, gapy;     /* total gaps in seqs */
int             len0, len1;     /* seq lens */
int             ngapx, ngapy;   /* total size of gaps */
int             smax;           /* max score: nw() */
int             *xbm;           /* bitmap for matching */
long            offset;         /* current offset in jmp file */
struct diag     *dx;            /* holds diagonals */
struct path     pp[2];          /* holds path for seqs */

char            *calloc(), *malloc(), *index(), *strcpy();
char            *getseq(), *g_calloc();

FIG. 4B
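
The gap constants in this header imply an affine gap cost: a fixed opening penalty plus a per-element penalty that stops growing once the gap reaches MAXGAP elements. A minimal sketch, assuming end gaps are not penalized (the helper `gap_penalty` is hypothetical and does not appear in the listing; the constants are the ones defined in the header):

```c
#include <assert.h>

#define MAXGAP 24   /* don't continue to penalize gaps larger than this */
#define DINS0   8   /* DNA: penalty for a gap */
#define DINS1   1   /* DNA: penalty per base */
#define PINS0   8   /* protein: penalty for a gap */
#define PINS1   4   /* protein: penalty per residue */

/* Hypothetical helper: total penalty for one gap of `len` elements when
 * end gaps are not penalized. Elements beyond MAXGAP add nothing. */
int gap_penalty(int len, int is_dna)
{
    int ins0 = is_dna ? DINS0 : PINS0;
    int ins1 = is_dna ? DINS1 : PINS1;
    int charged = (len < MAXGAP) ? len : MAXGAP;

    assert(len >= 1);
    return ins0 + ins1 * charged;
}
```

For example, a three-residue gap in a protein alignment costs 8 + 3 * 4 = 20 under these constants.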
/* Needleman-Wunsch alignment program
 *
 * usage: progs file1 file2
 *  where file1 and file2 are two dna or two protein sequences.
 *  The sequences can be in upper- or lower-case and may contain ambiguity
 *  Any lines beginning with ';', '>' or '<' are ignored
 *  Max file length is 65535 (limited by unsigned short x in the jmp struct)
 *  A sequence with 1/3 or more of its elements ACGTU is assumed to be DNA
 *  Output is in the file "align.out"
 *
 * The program may create a tmp file in /tmp to hold info about traceback.
 * Original version developed under BSD 4.3 on a vax 8650
 */
#include "nw.h"
#include "day.h"

static  _dbval[26] = {
    1,14,2,13,0,0,4,11,0,0,12,0,3,15,0,0,0,5,6,8,8,7,9,0,10,0
};

static  _pbval[26] = {
    1, 2|(1<<('D'-'A'))|(1<<('N'-'A')), 4, 8, 16, 32, 64,
    128, 256, 0xFFFFFFF, 1<<10, 1<<11, 1<<12, 1<<13, 1<<14,
    1<<15, 1<<16, 1<<17, 1<<18, 1<<19, 1<<20, 1<<21, 1<<22,
    1<<23, 1<<24, 1<<25|(1<<('E'-'A'))|(1<<('Q'-'A'))
};

main(ac, av)
    int     ac;
    char    *av[];
{
    prog = av[0];
    if (ac != 3) {
        fprintf(stderr,"usage: %s file1 file2\n", prog);
        fprintf(stderr,"where file1 and file2 are two dna or two protein sequences.\n");
        fprintf(stderr,"The sequences can be in upper- or lower-case\n");
        fprintf(stderr,"Any lines beginning with ';' or '<' are ignored\n");
        fprintf(stderr,"Output is in the file \"align.out\"\n");
        exit(1);
    }
    namex[0] = av[1];
    namex[1] = av[2];
    seqx[0] = getseq(namex[0], &len0);
    seqx[1] = getseq(namex[1], &len1);
    xbm = (dna)? _dbval : _pbval;

    endgaps = 0;            /* 1 to penalize endgaps */
    ofile = "align.out";    /* output file */

    nw();                   /* fill in the matrix, get the possible jmps */
    readjmps();             /* get the actual jmps */
    print();                /* print stats, alignment */

    cleanup(0);             /* unlink any tmp files */
}

FIG. 4C
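
For DNA, `_dbval` assigns each letter a 4-bit mask, and two bases count as matching whenever their masks share a bit; this is how ambiguity codes are handled. A minimal sketch of that test (the function `dna_match` is a hypothetical name; the table is copied from the listing, and the reading of the bits as A=1, C=2, G=4, T/U=8 is inferred from the table values):

```c
#include <assert.h>
#include <ctype.h>

/* One mask per letter A..Z, as in _dbval; zero means "not a nucleotide code". */
static const int dbval[26] = {
    1,14,2,13,0,0,4,11,0,0,12,0,3,15,0,0,0,5,6,8,8,7,9,0,10,0
};

/* Hypothetical helper: nonzero if bases a and b are compatible,
 * i.e. their masks have at least one bit in common. */
int dna_match(char a, char b)
{
    assert(isalpha((unsigned char)a) && isalpha((unsigned char)b));
    return (dbval[toupper((unsigned char)a) - 'A'] &
            dbval[toupper((unsigned char)b) - 'A']) != 0;
}
```

Under this table R (mask 5) is compatible with A and G but not with C or T, and N (mask 15) is compatible with every base.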
/* do the alignment, return best score: main()
 * dna: values in Fitch and Smith, PNAS, 80, 1382-1386, 1983
 * pro: PAM 250 values
 * When scores are equal, we prefer mismatches to any gap, prefer
 * a new gap to extending an ongoing gap, and prefer a gap in seqx
 * to a gap in seq y.
 */
nw()
{
    char        *px, *py;       /* seqs and ptrs */
    int         *ndely, *dely;  /* keep track of dely */
    int         ndelx, delx;    /* keep track of delx */
    int         *tmp;           /* for swapping row0, row1 */
    int         mis;            /* score for each type */
    int         ins0, ins1;     /* insertion penalties */
    register    id;             /* diagonal index */
    register    ij;             /* jmp index */
    register    *col0, *col1;   /* score for curr, last row */
    register    xx, yy;         /* index into seqs */

    dx = (struct diag *)g_calloc("to get diags", len0+len1+1, sizeof(struct diag));

    ndely = (int *)g_calloc("to get ndely", len1+1, sizeof(int));
    dely = (int *)g_calloc("to get dely", len1+1, sizeof(int));
    col0 = (int *)g_calloc("to get col0", len1+1, sizeof(int));
    col1 = (int *)g_calloc("to get col1", len1+1, sizeof(int));
    ins0 = (dna)? DINS0 : PINS0;
    ins1 = (dna)? DINS1 : PINS1;

    smax = -10000;
    if (endgaps) {
        for (col0[0] = dely[0] = -ins0, yy = 1; yy <= len1; yy++) {
            col0[yy] = dely[yy] = col0[yy-1] - ins1;
            ndely[yy] = yy;
        }
        col0[0] = 0;    /* Waterman Bull Math Biol 84 */
    }
    else
        for (yy = 1; yy <= len1; yy++)
            dely[yy] = -ins0;

    /* fill in match matrix
     */
    for (px = seqx[0], xx = 1; xx <= len0; px++, xx++) {
        /* initialize first entry in col
         */
        if (endgaps) {
            if (xx == 1)
                col1[0] = delx = -(ins0+ins1);
            else
                col1[0] = delx = col0[0] - ins1;
            ndelx = xx;
        }
        else {
            col1[0] = 0;
            delx = -ins0;
            ndelx = 0;
        }
        for (py = seqx[1], yy = 1; yy <= len1; py++, yy++) {
            mis = col0[yy-1];
            if (dna)
                mis += (xbm[*px-'A']&xbm[*py-'A'])? DMAT : DMIS;
            else
                mis += _day[*px-'A'][*py-'A'];

            /* update penalty for del in x seq;
             * favor new del over ongoing del
             * ignore MAXGAP if weighting endgaps
             */
            if (endgaps || ndely[yy] < MAXGAP) {
                if (col0[yy] - ins0 >= dely[yy]) {
                    dely[yy] = col0[yy] - (ins0+ins1);
                    ndely[yy] = 1;
                } else {
                    dely[yy] -= ins1;
                    ndely[yy]++;
                }
            } else {
                if (col0[yy] - (ins0+ins1) >= dely[yy]) {
                    dely[yy] = col0[yy] - (ins0+ins1);
                    ndely[yy] = 1;
                } else
                    ndely[yy]++;
            }

            /* update penalty for del in y seq;
             * favor new del over ongoing del
             */
            if (endgaps || ndelx < MAXGAP) {
                if (col1[yy-1] - ins0 >= delx) {
                    delx = col1[yy-1] - (ins0+ins1);
                    ndelx = 1;
                } else {
                    delx -= ins1;
                    ndelx++;
                }
            } else {
                if (col1[yy-1] - (ins0+ins1) >= delx) {
                    delx = col1[yy-1] - (ins0+ins1);
                    ndelx = 1;
                } else
                    ndelx++;
            }

FIG. 4E
            /* pick the maximum score; ties favor mis over any del
             * and delx over dely
             */
            id = xx - yy + len1 - 1;
            if (mis >= delx && mis >= dely[yy])
                col1[yy] = mis;
            else if (delx >= dely[yy]) {
                col1[yy] = delx;
                ij = dx[id].ijmp;
                if (dx[id].jp.n[0] && (!dna || (ndelx >= MAXJMP
                && xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
                    dx[id].ijmp++;
                    if (++ij >= MAXJMP) {
                        writejmps(id);
                        ij = dx[id].ijmp = 0;
                        dx[id].offset = offset;
                        offset += sizeof(struct jmp) + sizeof(offset);
                    }
                }
                dx[id].jp.n[ij] = ndelx;
                dx[id].jp.x[ij] = xx;
                dx[id].score = delx;
            }
            else {
                col1[yy] = dely[yy];
                ij = dx[id].ijmp;
                if (dx[id].jp.n[0] && (!dna || (ndely[yy] >= MAXJMP
                && xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
                    dx[id].ijmp++;
                    if (++ij >= MAXJMP) {
                        writejmps(id);
                        ij = dx[id].ijmp = 0;
                        dx[id].offset = offset;
                        offset += sizeof(struct jmp) + sizeof(offset);
                    }
                }
                dx[id].jp.n[ij] = -ndely[yy];
                dx[id].jp.x[ij] = xx;
                dx[id].score = dely[yy];
            }
            if (xx == len0 && yy < len1) {
                /* last col
                 */
                if (endgaps)
                    col1[yy] -= ins0+ins1*(len1-yy);
                if (col1[yy] > smax) {
                    smax = col1[yy];
                    dmax = id;

FIG. 4F-1

P2827R1

                }
            }
        }
        if (endgaps && xx < len0)
            col1[yy-1] -= ins0+ins1*(len0-xx);
        if (col1[yy-1] > smax) {
            smax = col1[yy-1];
            dmax = id;
        }
        tmp = col0; col0 = col1; col1 = tmp;
    }
    (void) free((char *)ndely);
    (void) free((char *)dely);
    (void) free((char *)col0);
    (void) free((char *)col1);
}

Page 4 of nw.c

FIG. 4F-2
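
The tie-breaking rule stated in the header comment of nw() (prefer a mismatch to any gap, and one kind of gap over the other) follows directly from the order of the comparisons. A minimal sketch isolating that choice (the enum, `pick_move`, and `best_score` are hypothetical names introduced for illustration; they do not appear in the listing):

```c
#include <assert.h>

enum move { MOVE_DIAG, MOVE_DELX, MOVE_DELY };

/* Hypothetical helper mirroring the comparison order in nw():
 *   mis  = score of aligning the two elements (match or mismatch),
 *   delx = score of ending in the gap tracked by ndelx,
 *   dely = score of ending in the gap tracked by ndely.
 * Ties go to mis first, then to delx. */
enum move pick_move(int mis, int delx, int dely)
{
    if (mis >= delx && mis >= dely)
        return MOVE_DIAG;
    else if (delx >= dely)
        return MOVE_DELX;
    return MOVE_DELY;
}

/* Score stored in the matrix cell: always the maximum of the three. */
int best_score(int mis, int delx, int dely)
{
    enum move m = pick_move(mis, delx, dely);
    int s = (m == MOVE_DIAG) ? mis : (m == MOVE_DELX) ? delx : dely;

    assert(s >= mis && s >= delx && s >= dely);
    return s;
}
```

With all three candidates equal the diagonal move wins; when only the two gap scores tie, the delx branch is taken.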
/* print() -- only routine visible outside this module
 *
 * static:
 * getmat() -- trace back best path, count matches: print()
 * pr_align() -- print alignment of described in array p[]: print()
 * dumpblock() -- dump a block of lines with numbers, stars: pr_align()
 * nums() -- put out a number line: dumpblock()
 * putline() -- put out a line (name, [num], seq, [num]): dumpblock()
 * stars() -- put a line of stars: dumpblock()
 * stripname() -- strip any path and prefix from a seqname
 */

#include "nw.h"

#define SPC     3
#define P_LINE  256     /* maximum output line */
#define P_SPC   3       /* space between name or num and seq */

extern  _day[26][26];
int     olen;           /* set output line length */
FILE    *fx;            /* output file */

print()
{
    int     lx, ly, firstgap, lastgap;      /* overlap */

    if ((fx = fopen(ofile, "w")) == 0) {
        fprintf(stderr,"%s: can't write %s\n", prog, ofile);
        cleanup(1);
    }
    fprintf(fx, "<first sequence: %s (length = %d)\n", namex[0], len0);
    fprintf(fx, "<second sequence: %s (length = %d)\n", namex[1], len1);
    olen = 60;
    lx = len0;
    ly = len1;
    firstgap = lastgap = 0;
    if (dmax < len1 - 1) {          /* leading gap in x */
        pp[0].spc = firstgap = len1 - dmax - 1;
        ly -= pp[0].spc;
    }
    else if (dmax > len1 - 1) {     /* leading gap in y */
        pp[1].spc = firstgap = dmax - (len1 - 1);
        lx -= pp[1].spc;
    }
    if (dmax0 < len0 - 1) {         /* trailing gap in x */
        lastgap = len0 - dmax0 - 1;
        lx -= lastgap;
    }
    else if (dmax0 > len0 - 1) {    /* trailing gap in y */
        lastgap = dmax0 - (len0 - 1);
        ly -= lastgap;
    }
    getmat(lx, ly, firstgap, lastgap);
    pr_align();
}

FIG. 4G
/*
 * trace back the best path, count matches
 */
static
getmat(lx, ly, firstgap, lastgap)
    int     lx, ly;                 /* "core" (minus endgaps) */
    int     firstgap, lastgap;      /* leading trailing overlap */
{
    int             nm, i0, i1, siz0, siz1;
    char            outx[32];
    double          pct;
    register        n0, n1;
    register char   *p0, *p1;

    /* get total matches, score
     */
    i0 = i1 = siz0 = siz1 = 0;
    p0 = seqx[0] + pp[1].spc;
    p1 = seqx[1] + pp[0].spc;
    n0 = pp[1].spc + 1;
    n1 = pp[0].spc + 1;

    nm = 0;
    while ( *p0 && *p1 ) {
        if (siz0) {
            p1++;
            n1++;
            siz0--;
        }
        else if (siz1) {
            p0++;
            n0++;
            siz1--;
        }
        else {
            if (xbm[*p0-'A']&xbm[*p1-'A'])
                nm++;
            if (n0++ == pp[0].x[i0])
                siz0 = pp[0].n[i0++];
            if (n1++ == pp[1].x[i1])
                siz1 = pp[1].n[i1++];
            p0++;
            p1++;
        }
    }

    /* pct homology:
     * if penalizing endgaps, base is the shorter seq
     * else, knock off overhangs and take shorter core
     */
    if (endgaps)
        lx = (len0 < len1)? len0 : len1;
    else
        lx = (lx < ly)? lx : ly;
    pct = 100.*(double)nm/(double)lx;
    fprintf(fx, "\n");
    fprintf(fx, "<%d match%s in an overlap of %d: %.2f percent similarity\n",
        nm, (nm == 1)? "" : "es", lx, pct);

FIG. 4H
    fprintf(fx, "<gaps in first sequence: %d", gapx);
    if (gapx) {
        (void) sprintf(outx, " (%d %s%s)",
            ngapx, (dna)? "base":"residue", (ngapx == 1)? "":"s");
        fprintf(fx,"%s", outx);
    }

    fprintf(fx, ", gaps in second sequence: %d", gapy);
    if (gapy) {
        (void) sprintf(outx, " (%d %s%s)",
            ngapy, (dna)? "base":"residue", (ngapy == 1)? "":"s");
        fprintf(fx,"%s", outx);
    }
    if (dna)
        fprintf(fx,
        "\n<score: %d (match = %d, mismatch = %d, gap penalty = %d + %d per base)\n",
        smax, DMAT, DMIS, DINS0, DINS1);
    else
        fprintf(fx,
        "\n<score: %d (Dayhoff PAM 250 matrix, gap penalty = %d + %d per residue)\n",
        smax, PINS0, PINS1);
    if (endgaps)
        fprintf(fx,
        "<endgaps penalized, left endgap: %d %s%s, right endgap: %d %s%s\n",
        firstgap, (dna)? "base" : "residue", (firstgap == 1)? "" : "s",
        lastgap, (dna)? "base" : "residue", (lastgap == 1)? "" : "s");
    else
        fprintf(fx, "<endgaps not penalized\n");
}

static          nm;             /* matches in core -- for checking */
static          lmax;           /* lengths of stripped file names */
static          ij[2];          /* jmp index for a path */
static          nc[2];          /* number at start of current line */
static          ni[2];          /* current elem number -- for gapping */
static          siz[2];
static char     *ps[2];         /* ptr to current element */
static char     *po[2];         /* ptr to next output char slot */
static char     out[2][P_LINE]; /* output line */
static char     star[P_LINE];   /* set by stars() */

/*
 * print alignment of described in struct path pp[]
 */
static
pr_align()
{
    int         nn;     /* char count */
    int         more;
    register    i;

    for (i = 0, lmax = 0; i < 2; i++) {
        nn = stripname(namex[i]);
        if (nn > lmax)
            lmax = nn;

        nc[i] = 1;
        ni[i] = 1;
        siz[i] = ij[i] = 0;
        ps[i] = seqx[i];
        po[i] = out[i];
    }

FIG. 4I-2
    for (nn = nm = 0, more = 1; more; ) {
        for (i = more = 0; i < 2; i++) {
            /*
             * do we have more of this sequence?
             */
            if (!*ps[i])
                continue;

            more++;

            if (pp[i].spc) {        /* leading space */
                *po[i]++ = ' ';
                pp[i].spc--;
            }
            else if (siz[i]) {      /* in a gap */
                *po[i]++ = '-';
                siz[i]--;
            }
            else {                  /* we're putting a seq element
                                     */
                *po[i] = *ps[i];
                if (islower(*ps[i]))
                    *ps[i] = toupper(*ps[i]);
                po[i]++;
                ps[i]++;

                /*
                 * are we at next gap for this seq?
                 */
                if (ni[i] == pp[i].x[ij[i]]) {
                    /*
                     * we need to merge all gaps
                     * at this location
                     */
                    siz[i] = pp[i].n[ij[i]++];
                    while (ni[i] == pp[i].x[ij[i]])
                        siz[i] += pp[i].n[ij[i]++];
                }
                ni[i]++;
            }
        }
        if (++nn == olen || !more && nn) {
            dumpblock();
            for (i = 0; i < 2; i++)
                po[i] = out[i];
            nn = 0;
        }
    }
}

/*
 * dump a block of lines, including numbers, stars: pr_align()
 */
static
dumpblock()
{
    register    i;

    for (i = 0; i < 2; i++)
        *po[i]-- = '\0';

FIG. 4J
    (void) putc('\n', fx);
    for (i = 0; i < 2; i++) {
        if (*out[i] && (*out[i] != ' ' || *(po[i]) != ' ')) {
            if (i == 0)
                nums(i);
            if (i == 0 && *out[1])
                stars();
            putline(i);
            if (i == 0 && *out[1])
                fprintf(fx, star);
            if (i == 1)
                nums(i);
        }
    }
}

/*
 * put out a number line: dumpblock()
 */
static
nums(ix)
    int     ix;     /* index in out[] holding seq line */
{
    char            nline[P_LINE];
    register        i, j;
    register char   *pn, *px, *py;

    for (pn = nline, i = 0; i < lmax+P_SPC; i++, pn++)
        *pn = ' ';
    for (i = nc[ix], py = out[ix]; *py; py++, pn++) {
        if (*py == ' ' || *py == '-')
            *pn = ' ';
        else {
            if (i%10 == 0 || (i == 1 && nc[ix] != 1)) {
                j = (i < 0)? -i : i;
                for (px = pn; j; j /= 10, px--)
                    *px = j%10 + '0';
                if (i < 0)
                    *px = '-';
            }
            else
                *pn = ' ';
            i++;
        }
    }
    *pn = '\0';
    nc[ix] = i;
    for (pn = nline; *pn; pn++)
        (void) putc(*pn, fx);
    (void) putc('\n', fx);
}

/*
 * put out a line (name, [num], seq, [num]): dumpblock()
 */
static
putline(ix)
    int     ix;
{
    int             i;
    register char   *px;

    for (px = namex[ix], i = 0; *px && *px != ':'; px++, i++)
        (void) putc(*px, fx);
    for (; i < lmax+P_SPC; i++)
        (void) putc(' ', fx);

    /* these count from 1:
     * ni[] is current element (from 1)
     * nc[] is number at start of current line
     */
    for (px = out[ix]; *px; px++)
        (void) putc(*px&0x7F, fx);
    (void) putc('\n', fx);
}

/*
 * put a line of stars (seqs always in out[0], out[1]): dumpblock()
 */
static
stars()
{
    int             i;
    register char   *p0, *p1, cx, *px;

    if (!*out[0] || (*out[0] == ' ' && *(po[0]) == ' ') ||
        !*out[1] || (*out[1] == ' ' && *(po[1]) == ' '))
        return;
    px = star;
    for (i = lmax+P_SPC; i; i--)
        *px++ = ' ';

    for (p0 = out[0], p1 = out[1]; *p0 && *p1; p0++, p1++) {
        if (isalpha(*p0) && isalpha(*p1)) {

            if (xbm[*p0-'A']&xbm[*p1-'A']) {
                cx = '*';
                nm++;
            }
            else if (!dna && _day[*p0-'A'][*p1-'A'] > 0)
                cx = '.';
            else
                cx = ' ';
        }
        else
            cx = ' ';
        *px++ = cx;
    }
    *px++ = '\n';
    *px = '\0';
}

FIG. 4L
/*
 * strip path or prefix from pn, return len: pr_align()
 */
static
stripname(pn)
    char    *pn;    /* file name (may be path) */
{
    register char   *px, *py;

    py = 0;
    for (px = pn; *px; px++)
        if (*px == '/')
            py = px + 1;
    if (py)
        (void) strcpy(pn, py);
    return(strlen(pn));
}

Page 7 of nwprint.c
/*
 * cleanup() -- cleanup any tmp file
 * getseq() -- read in seq, set dna, len, maxlen
 * g_calloc() -- calloc() with error checkin
 * readjmps() -- get the good jmps, from tmp file if necessary
 * writejmps() -- write a filled array of jmps to a tmp file: nw()
 */
#include "nw.h"
#include <sys/file.h>

char    *jname = "/tmp/homgXXXXXX";     /* tmp file for jmps */
FILE    *fj;

int     cleanup();                      /* cleanup tmp file */
long    lseek();

/*
 * remove any tmp file if we blow
 */
cleanup(i)
    int     i;
{
    if (fj)
        (void) unlink(jname);
    exit(i);
}

/*
 * read, return ptr to seq, set dna, len, maxlen
 * skip lines starting with ';', '<', or '>'
 * seq in upper or lower case
 */
char    *
getseq(file, len)
    char    *file;  /* file name */
    int     *len;   /* seq len */
{
    char            line[1024], *pseq;
    register char   *px, *py;
    int             natgc, tlen;
    FILE            *fp;

    if ((fp = fopen(file,"r")) == 0) {
        fprintf(stderr,"%s: can't read %s\n", prog, file);
        exit(1);
    }
    tlen = natgc = 0;
    while (fgets(line, 1024, fp)) {
        if (*line == ';' || *line == '<' || *line == '>')
            continue;
        for (px = line; *px != '\n'; px++)
            if (isupper(*px) || islower(*px))
                tlen++;
    }
    if ((pseq = malloc((unsigned)(tlen+6))) == 0) {
        fprintf(stderr,"%s: malloc() failed to get %d bytes for %s\n", prog, tlen+6, file);
        exit(1);
    }
    pseq[0] = pseq[1] = pseq[2] = pseq[3] = '\0';
    py = pseq + 4;
    *len = tlen;
    rewind(fp);

    while (fgets(line, 1024, fp)) {
        if (*line == ';' || *line == '<' || *line == '>')
            continue;
        for (px = line; *px != '\n'; px++) {
            if (isupper(*px))
                *py++ = *px;
            else if (islower(*px))
                *py++ = toupper(*px);
            if (index("ATGCU",*(py-1)))
                natgc++;
        }
    }
    *py++ = '\0';
    *py = '\0';
    (void) fclose(fp);
    dna = natgc > (tlen/3);
    return(pseq+4);
}
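
getseq() decides between DNA and protein by counting the letters A, T, G, C and U, in line with the rule in the usage comment that a sequence with roughly one third or more such elements is treated as DNA. A minimal sketch of that heuristic on an in-memory string (the function `looks_like_dna` is a hypothetical name; the strict comparison `natgc > tlen/3` with integer division is an assumption about the corresponding line in the listing):

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if more than a third of the letters in
 * seq are A, C, G, T or U (case-insensitive); non-letters are ignored. */
int looks_like_dna(const char *seq)
{
    int natgc = 0, tlen = 0;

    assert(seq != NULL);
    for (; *seq; seq++) {
        int c = toupper((unsigned char)*seq);

        if (!isalpha(c))
            continue;
        tlen++;
        if (strchr("ATGCU", c))
            natgc++;
    }
    return natgc > tlen / 3;
}
```

A protein that happens to be rich in alanine, glycine, cysteine and threonine could be misclassified by such a test, which is a known limitation of composition-based heuristics.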

char    *
g_calloc(msg, nx, sz)
    char    *msg;           /* program, calling routine */
    int     nx, sz;         /* number and size of elements */
{
    char    *px, *calloc();

    if ((px = calloc((unsigned)nx, (unsigned)sz)) == 0) {
        if (*msg) {
            fprintf(stderr, "%s: g_calloc() failed %s (n=%d, sz=%d)\n", prog, msg, nx, sz);
            exit(1);
        }
    }
    return(px);
}

/*
 * get final jmps from dx[] or tmp file, set pp[], reset dmax: main()
 */
readjmps()
{
    int         fd = -1;
    int         siz, i0, i1;
    register    i, j, xx;

    if (fj) {
        (void) fclose(fj);
        if ((fd = open(jname, O_RDONLY, 0)) < 0) {
            fprintf(stderr, "%s: can't open() %s\n", prog, jname);
            cleanup(1);
        }
    }
    for (i = i0 = i1 = 0, dmax0 = dmax, xx = len0; ; i++) {
        while (1) {
            for (j = dx[dmax].ijmp; j >= 0 && dx[dmax].jp.x[j] >= xx; j--)
                ;
            if (j < 0 && dx[dmax].offset && fj) {
                (void) lseek(fd, dx[dmax].offset, 0);
                (void) read(fd, (char *)&dx[dmax].jp, sizeof(struct jmp));
                (void) read(fd, (char *)&dx[dmax].offset, sizeof(dx[dmax].offset));
                dx[dmax].ijmp = MAXJMP-1;
            }
            else
                break;
        }
        if (i >= JMPS) {
            fprintf(stderr, "%s: too many gaps in alignment\n", prog);
            cleanup(1);
        }
        if (j >= 0) {
            siz = dx[dmax].jp.n[j];
            xx = dx[dmax].jp.x[j];
            dmax += siz;
            if (siz < 0) {          /* gap in second seq */
                pp[1].n[i1] = -siz;
                xx += siz;
                /* id = xx - yy + len1 - 1
                 */
                pp[1].x[i1] = xx - dmax + len1 - 1;
                gapy++;
                ngapy -= siz;
                /* ignore MAXGAP when doing endgaps */
                siz = (-siz < MAXGAP || endgaps)? -siz : MAXGAP;
                i1++;
            }
            else if (siz > 0) {     /* gap in first seq */
                pp[0].n[i0] = siz;
                pp[0].x[i0] = xx;
                gapx++;
                ngapx += siz;
                /* ignore MAXGAP when doing endgaps */
                siz = (siz < MAXGAP || endgaps)? siz : MAXGAP;
                i0++;
            }
        }
        else
            break;
    }

FIG. 4P-1

    /* reverse the order of jmps
     */
    for (j = 0, i0--; j < i0; j++, i0--) {
        i = pp[0].n[j]; pp[0].n[j] = pp[0].n[i0]; pp[0].n[i0] = i;
        i = pp[0].x[j]; pp[0].x[j] = pp[0].x[i0]; pp[0].x[i0] = i;
    }
    for (j = 0, i1--; j < i1; j++, i1--) {
        i = pp[1].n[j]; pp[1].n[j] = pp[1].n[i1]; pp[1].n[i1] = i;
        i = pp[1].x[j]; pp[1].x[j] = pp[1].x[i1]; pp[1].x[i1] = i;
    }
    if (fd >= 0)
        (void) close(fd);
    if (fj) {
        (void) unlink(jname);
        fj = 0;
        offset = 0;
    }
}

Page 3 of nwsubr.c
/*
 * write a filled jmp struct offset of the prev one (if any): nw()
 */
writejmps(ix)
    int     ix;
{
    char    *mktemp();

    if (!fj) {
        if (mktemp(jname) < 0) {
            fprintf(stderr, "%s: can't mktemp() %s\n", prog, jname);
            cleanup(1);
        }
        if ((fj = fopen(jname, "w")) == 0) {
            fprintf(stderr, "%s: can't write %s\n", prog, jname);
            exit(1);
        }
    }
    (void) fwrite((char *)&dx[ix].jp, sizeof(struct jmp), 1, fj);
    (void) fwrite((char *)&dx[ix].offset, sizeof(dx[ix].offset), 1, fj);
}
GTGCTCTCCGAGGACAAGCAGGAGGNGGTGGAGCTGGTGAAGCACCATCTGTGGGCTCTG 
!3AAGTGTGCTACATCTCA(^CCTTGGTCTTGTCGTGGTTAGTCACCTTGCTGGTCCTGATG 
(:GCTCACTi:;GTGACACAGAGGACCAACGTTG(3AGGTGTGGAGGGAGGAGGTGGCCTGGAC 
TTGAGTGGGTTGCATCGGAGTCCCCATCCCTCCCGCCAAGGGATATTGTGTTGGATGAGG 
TTGAGTGCCTACCA(3ACAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATGATGTTG 
TTCCTGGGAACGAGGGCCCTGGCCTTGCTGGTGGTCATGGGTGTGGTCCATGGGAGGAAG 
CTGCTGCTGTTGGGTTGGGTGGAGTCGTGGTG'^GGGTTGTGGCTGACTTTGGGCCTGGCT 
(:^TGATf:CTGCAGAAGATGGCAGGGCATTGGGTGTTGCTGGAGACTt:ATGAT(3GACAG(:CA 
CAGCTGACCAACCGGGGAGTGCTCTATGCAGGCA(:GTTTGTTGTGTTGCGCCTCAATGTG 
CTGGTGGGTGCCATGGTGGCCACCTGGGGAGTGCTGCTGTGTGCGCTGTACAAGGCCATC 

caccttggccagatggacctcagcctgctgccaccgagagccgccactctcgaccccggc 
tactacacgtaccgaa 
cacaaccagccacccctctaggatcccagcccagctggtgctgggctcagaggagaaggc 
cgcgtgttgggagcagggtgcttg::gtggagggagaagtttcggggagagatcaataaag 
gaaaggaaagagagaaggaagggagaggtgaggagaggggttgattggaggagaaggggg 

AGAGA ATG TGGTCGGAGGGAGGAGGGAAGGAGAGGTCGGCCGGG':^GGACAGAGGACTAGT 
CCTATGGGAGGTGGTACATCGATGAGGGGGAGGGGGGGGAGGAGCTCGAGGGAGAGGGGG 

aagtggcctcgtgccacaccaggatacgacgcggggtgtagcacgcctgcctggcgtcgc 

TGTGAATCCTTGTGCTGCTGCTGCTGGGGATGGTGGTGAGGCGGCGCCAGCTCTGGGCTG 

actgtgtgcgtgggaggcccggggtggcgagggcccggggagtgcgtgctgctgttttca 

TGGTGGTCCTGAGCTCCGTGTGTTTGtGTGGTCGCGGAGGAGGACGGATTGGGGTTCGTGA 
CTGTGGCGTCAGGACCCAGCGAAGATGGGAAAACTGAG'GGTCCAAGAGGGGGCTGGAAGA 
TACTGGGACTGTTCTATTATGCTGCCGTCTAGTACGGTCTGGCTGCCTGTGCCACGGCTG 
GCCACACAGCTGCACACCTGGTCGGCAGGAGGCTGTCGTGGGCGCACCTTGGGGTCCAGG 
TCTGGCAGAGGGCA(^AGTGTGCGGAGGTGGGGAAGATGTAGAAGTAGTACTGGCTGCTGG 
CCTGCCTGCCTGTCGTGCTGGGGGTCGGATTGGTGAGGGTTTGGTACCCTGTGCAGCTGG 
TGAGAAGGTTCAGCGGTAGGACAGGAGGAGGGTCCAAGG(:^GGTGGAGAGGAGGTACTCTG 

7\ t'^ 7\ A 7\ 'V ("^ 7\ 7\ 7\ t"^ <T^ '"^ T\ -"^ i"^ A A'^A7\;'^''~'T'.^'~'''^A 7\r^.'^7\r^r'*'~r'T\r^r^Ar^7\n<r^T'J^r^r\ 

i 1 ^ i ij 1 X J. i J. J. i v_j £ ij i X X X ■_; ^ ■ J rir^ -^ i~i.r^-^ j. -J rir-v -^^ w x i w '^/~v >^ x >^ -.^ 

AGCATGGCTTCCTGTCCTGGGCCGGGGTCTGGTTGAGACACTGGATCTACAGTCCACAGG 
CAGGATTCCATGTCCGGCTGAAGCTGGTGCTTTCAGCTACAGTGACAGGGACGGCCATTT 
ACCAGGTGGCCCTGCTGCTGCTGGTGGGGGTGGTACCGACTATCCAGAAGGTGAGGGCAG 
GGGTGACCACGGATGTCTGCTAGGTGCTGGCGGGCTTTGGAATCGTGCTCTCCGAGGACA 
AGCAGGAGGTGGTGGAGGTGGTGAAGGAGGATCTGTGGGCTCTGGAAGTGTGCTACATCT 
CAGGCTTGGTGTTGTGCTGCTTAGTCAGCTTGCTGGTCGTGATGCGCTGACTGGTGACAC 
AGAGGACCAACCTTCGAGCTCTGCAGGGAGGAGCTGCCGTGGAGTTGAGTCCCTTGCATG 
GGAGTCCCCATGGCTCCCGCCAAGCCATATTCTGTTGGATGAGGTTCAGTGCCTACCAGA 
CAGGCTTTATCTGGCTTGGGCTCCTGGTGCAGCAGATGATCTTCTTCGTGGGAACCACGG 
GCGTGGCCTTCCTGGTGCTCATGCGTGTGGTGGATGGGAGGAAGCTCCTGGTCTTCCGTT 
CCCTGGAGTCGTGGTGGCCCTTCTGGGTGAGTTTGGGGCTGGCTGTGATCGTGCAGAACA 
TGGCAGGCCATTGGGTGTTGGTGGAGACTGATGATGGACAGGGACAGCTGACCAAGCGGG 
GAGTGGTGTATGGAGCCAGCTTTGTTGTGTTCCCCCTGAATGTGGTGGTGGGTGCGATAG 
TGGCGACCTGGGGAGTGCTCCTCTCTGGCCTCTAGAAGGGGATGGAGGTTGGCCAGATGG 
ACCTGAGCCTGGTGCGACCGAGAGCGGGGACTCTCGAGGGGGGCTACTACACGTACCGAA 
ACTTGTTGAAGATTGAAGTCAGCCAGTCGCATCCAGCCATGACAGCCTTCTGCTCCCTGC 
TCCTGCAAGCGCAGAGCGTCCTACCCAGGACCATGGCAGCCCCCCAGGAGAGCCTCAGAC 
GAGGGGAGGAAGAGGAAGGGATGCAGGTGCTACAGACAAAGGAGTCCATGGCCAAGGGAG 

ctaggggcgggggcagcggcggcaggggtggcti:;gggtgtggggtagagggtggtggaga 

A :CCAAGGGTGCAGGTCTTCGGGAAGAG'G.i3CCCTGTTGGG^TGGGAATGGTGCCCAGGCGT 
GAGGGGAGGGAAG-GTCAA'^CGAGCTGGGGATCTGTi^CT ^AGGGATGTTCGTGCCTACCAG 
CTGGTGGGTGGGCGGCTGTCGTCGCAGCATCACACGAGGCATGCAGGGAGGAGGTCCTCG 
GGATGAGTGTGGTTGGGTGGAGGTCTGTCTGCACTGGGAGGGTCAGGAGGGGTGTGCTCG 
ACCCAGTTGGGTATGGGAGAGCCAGGAGt^GGTTCTGGAGAAAGAAACTGGTGGGTTAGGG 
CGTTGGTCCAGGAGGGAGTTGAGCCAGGGCAGCCAGATCCAGGCGTCTCGGTAGCCTGGG 
TCTGGGATGAGGCTTGAAGGGGGTCGATGAAGGCTTGTGTGGAAGGAI3TGGAGGCCAGCT 
GCAGGTCAGGCTTGGCCTTCACGCTGTGGAAGCAGCCAAi^Gi^AGTTGGTGAGGGCCTCAG 
mssqpagnqtspgatedysygswyidepqggeelqpegevpschtsippglyhaclasl 
silvllllamlvrrrqlwpdcvrgrpglprpravpaavfmvllsslclllpdedalpfl 
tlasapsqdgkteaprgawkilglfyyaalyyplaacataghtaahllgstlswahlgv 
qvwqraecpqvpkiykyysllaslplllglgflslwypvqlvrsfsrrtgagskglqss 
ysesylpnllcrkklgssyhtskhgflswarvclrhciytpqpgfhlplklvlsatltg 
taiyqvallllvgvvptiqkvragvttdvsyllagfgivlsedkqevvelvkhhlwale 
vcyisalvlsclltflvlmrslvthrtnlralhrgaaldlsplhrsphpsrqaifcwms 
fsayqtaficlgllvqqiifflgttalaflvlmpvlhgrnlllfrslesswpfwltlal 
avilqnmaahwvflethdghpqltnrrvlyaatfllfplnvlvgaivatwrvllsalyn 
aihlgqmdlsllppraatldpgyytyrnflkievsqshpamtafcslllqaqsllprtm 
aapqdslrpgeedegmqllqtkdsmakgarpgasrgrarwglaytllhnptlqvfrkta 
llgangaqp 

Important features of the protein: 

Signal peptide: 

none 

Transmembrane domain: 

54-71 

93-111 

140-157 

197-214 

291-312 

356-371 

425-444 

464-481 

505-522 

Motif name: N-glycosylation site. 
8-12 

Motif name: N-myristoylation site. 

50-56 

167-173 

232-238 

308-314 

332-338 

516-522 

618-624 

622-628 

631-637 



Motif name: ATP/GTP-binding site motif A (P-loop). 



12^ ni 



FIG. 7 



Stra6 Variant Clones 

Labels recoverable from the schematic: Mouse; Human 148380; Human 148389; 89-97 SPVDFLAGD; 9 aa Deletion; G/A Polymorphism (M, I); >74% Identity; positions 518, 527, 528, 658, 667, 670. 


Hydrophobicity Plot of Human Stra6 

3 kb mRNA 
Brain, Heart, Kidney, Liver, Lung, Trachea, Bone Marrow, Colon 

Breast, Spleen, Stomach, Thymus, Small Intestine, Prostate, Skeletal Muscle, Testis, Uterus 
Stra6 RNA Expression in Human Colon Tumor Tissue 
Carcinoma Cells + / - Retinoic Acid 

TM #75 (2/28/00) VD3 - Vitamin D3 (1 µM); 
ATRA - All-Trans-Retinoic Acid (1 µM); 9cRA - 9-Cis-Retinoic Acid (1 µM) 
Stra6 Peptide Expression in E. coli 
Poly-His Cleavable Leader at N-Terminus 